Dataset statistics
| Number of variables | 9 |
|---|---|
| Number of observations | 813 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 57.3 KiB |
| Average record size in memory | 72.2 B |
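As a quick sanity check on the memory figures above, the average record size should equal the total size divided by the number of observations. A minimal sketch, assuming 1 KiB = 1024 B; the variable names are illustrative and not part of the report:

```python
# Figures copied from the "Dataset statistics" table.
total_size_kib = 57.3
n_observations = 813

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")  # 72.2 B
```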
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 1 |
Alerts
| df_index is highly correlated with N and 6 other fields | High correlation |
|---|---|
| N is highly correlated with df_index and 5 other fields | High correlation |
| P is highly correlated with df_index and 4 other fields | High correlation |
| K is highly correlated with df_index and 7 other fields | High correlation |
| humidity is highly correlated with df_index and 7 other fields | High correlation |
| rainfall is highly correlated with df_index and 6 other fields | High correlation |
| label is highly correlated with df_index and 7 other fields | High correlation |
| temperature is highly correlated with df_index and 4 other fields | High correlation |
| ph is highly correlated with K and 3 other fields | High correlation |
| df_index has unique values | Unique |
Reproduction
| Analysis started | 2023-03-14 18:37:56.906504 |
|---|---|
| Analysis finished | 2023-03-14 18:38:10.638439 |
| Duration | 13.73 seconds |
| Software version | pandas-profiling v3.4.0 |
| Download configuration | config.json |
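The reported duration can be cross-checked from the two timestamps above. A small sketch using only the standard library; the variable names are illustrative:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2023-03-14 18:37:56.906504")
finished = datetime.fromisoformat("2023-03-14 18:38:10.638439")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")  # 13.73 seconds
```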
df_index
| Distinct | 813 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 628.9766298 |
| Minimum | 0 |
|---|---|
| Maximum | 1640 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 40.6 |
| Q1 | 203 |
| median | 406 |
| Q3 | 1109 |
| 95-th percentile | 1577.4 |
| Maximum | 1640 |
| Range | 1640 |
| Interquartile range (IQR) | 906 |
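The quantile rows above are related to one another: the range is the maximum minus the minimum, and the IQR is Q3 minus Q1. The sketch below checks both and also shows one common way to compute a percentile, linear interpolation between order statistics. That the report uses exactly this interpolation is an assumption, and `percentile` is a hypothetical helper, not a pandas-profiling function.

```python
def percentile(sorted_values, q):
    """Percentile q (0..100) by linear interpolation between order statistics."""
    pos = (len(sorted_values) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])

# Figures copied from the quantile table above.
q1, q3, minimum, maximum = 203, 1109, 0, 1640
print(q3 - q1)            # 906  (IQR)
print(maximum - minimum)  # 1640 (range)

# Toy data (not the real column): eleven consecutive integers.
toy = list(range(11))
print(percentile(toy, 50))  # 5.0
print(percentile(toy, 5))   # 0.5
```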
Descriptive statistics
| Standard deviation | 519.214069 |
|---|---|
| Coefficient of variation (CV) | 0.8254902399 |
| Kurtosis | -1.035903468 |
| Mean | 628.9766298 |
| Median Absolute Deviation (MAD) | 313 |
| Skewness | 0.6289047422 |
| Sum | 511358 |
| Variance | 269583.2495 |
| Monotonicity | Strictly increasing |
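Several of the descriptive statistics above can be derived from one another: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A short check using the figures from the tables (813 observations is taken from the dataset statistics):

```python
# Figures copied from the tables above.
n_observations = 813
total = 511358
std = 519.214069

mean = total / n_observations
cv = std / mean
variance = std ** 2

print(round(mean, 4))      # 628.9766
print(round(cv, 4))        # 0.8255
print(round(variance, 1))  # 269583.2
```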
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | 0.1% |
| 946 | 1 | 0.1% |
| 936 | 1 | 0.1% |
| 937 | 1 | 0.1% |
| 938 | 1 | 0.1% |
| 939 | 1 | 0.1% |
| 940 | 1 | 0.1% |
| 941 | 1 | 0.1% |
| 942 | 1 | 0.1% |
| 943 | 1 | 0.1% |
| Other values (803) | 803 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1640 | 1 | |
| 1639 | 1 | |
| 1638 | 1 | |
| 1637 | 1 | |
| 1636 | 1 | |
| 1635 | 1 | |
| 1634 | 1 | |
| 1633 | 1 | |
| 1632 | 1 | |
| 1631 | 1 |
N
| Distinct | 100 |
|---|---|
| Distinct (%) | 12.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 42.52521525 |
| Minimum | 0 |
|---|---|
| Maximum | 100 |
| Zeros | 8 |
| Zeros (%) | 1.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 22 |
| median | 35 |
| Q3 | 67 |
| 95-th percentile | 93 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range (IQR) | 45 |
Descriptive statistics
| Standard deviation | 28.35881834 |
|---|---|
| Coefficient of variation (CV) | 0.6668706595 |
| Kurtosis | -1.010028628 |
| Mean | 42.52521525 |
| Median Absolute Deviation (MAD) | 22 |
| Skewness | 0.4595782639 |
| Sum | 34573 |
| Variance | 804.2225777 |
| Monotonicity | Not monotonic |
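The Median Absolute Deviation (MAD) row above is a robust measure of spread: the median of the absolute deviations from the median. A minimal sketch on toy data, assuming the unscaled definition (no consistency constant); `median_absolute_deviation` is an illustrative helper, not the report's own code:

```python
from statistics import median

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (unscaled)."""
    center = median(values)
    return median(abs(v - center) for v in values)

# Toy data (not the real column); the outlier 100 barely affects the result.
print(median_absolute_deviation([1, 2, 4, 7, 100]))  # 3
```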
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 27 | 26 | 3.2% |
| 60 | 19 | 2.3% |
| 32 | 19 | 2.3% |
| 22 | 18 | 2.2% |
| 35 | 18 | 2.2% |
| 40 | 18 | 2.2% |
| 28 | 18 | 2.2% |
| 25 | 17 | 2.1% |
| 37 | 17 | 2.1% |
| 31 | 16 | 2.0% |
| Other values (90) | 627 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8 | |
| 1 | 8 | |
| 2 | 9 | |
| 3 | 10 | |
| 4 | 4 | 0.5% |
| 5 | 12 | |
| 6 | 14 | |
| 7 | 10 | |
| 8 | 9 | |
| 9 | 13 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 100 | 1 | 0.1% |
| 99 | 10 | |
| 98 | 6 | |
| 97 | 5 | |
| 96 | 2 | 0.2% |
| 95 | 6 | |
| 94 | 7 | |
| 93 | 8 | |
| 92 | 5 | |
| 91 | 9 |
P
| Distinct | 72 |
|---|---|
| Distinct (%) | 8.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 77.63837638 |
| Minimum | 35 |
|---|---|
| Maximum | 145 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.5 KiB |
Quantile statistics
| Minimum | 35 |
|---|---|
| 5-th percentile | 38 |
| Q1 | 55 |
| median | 66 |
| Q3 | 80 |
| 95-th percentile | 141 |
| Maximum | 145 |
| Range | 110 |
| Interquartile range (IQR) | 25 |
Descriptive statistics
| Standard deviation | 33.82972581 |
|---|---|
| Coefficient of variation (CV) | 0.4357345862 |
| Kurtosis | -0.6805364156 |
| Mean | 77.63837638 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | 0.8817401804 |
| Sum | 63120 |
| Variance | 1144.450348 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 60 | 33 | 4.1% |
| 72 | 26 | 3.2% |
| 58 | 26 | 3.2% |
| 55 | 24 | 3.0% |
| 56 | 22 | 2.7% |
| 59 | 21 | 2.6% |
| 57 | 20 | 2.5% |
| 67 | 18 | 2.2% |
| 62 | 18 | 2.2% |
| 35 | 17 | 2.1% |
| Other values (62) | 588 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 35 | 17 | |
| 36 | 8 | |
| 37 | 7 | |
| 38 | 11 | |
| 39 | 7 | |
| 40 | 6 | 0.7% |
| 41 | 10 | |
| 42 | 7 | |
| 43 | 11 | |
| 44 | 14 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 145 | 8 | |
| 144 | 12 | |
| 143 | 10 | |
| 142 | 7 | |
| 141 | 7 | |
| 140 | 12 | |
| 139 | 12 | |
| 138 | 10 | |
| 137 | 7 | |
| 136 | 10 |
K
| Distinct | 44 |
|---|---|
| Distinct (%) | 5.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 77.2902829 |
| Minimum | 15 |
|---|---|
| Maximum | 205 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.5 KiB |
Quantile statistics
| Minimum | 15 |
|---|---|
| 5-th percentile | 16 |
| Q1 | 21 |
| median | 39 |
| Q3 | 85 |
| 95-th percentile | 203 |
| Maximum | 205 |
| Range | 190 |
| Interquartile range (IQR) | 64 |
Descriptive statistics
| Standard deviation | 73.14222508 |
|---|---|
| Coefficient of variation (CV) | 0.9463314447 |
| Kurtosis | -0.8614120092 |
| Mean | 77.2902829 |
| Median Absolute Deviation (MAD) | 22 |
| Skewness | 0.9393231055 |
| Sum | 62837 |
| Variance | 5349.78509 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=44)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 17 | 41 | 5.0% |
| 19 | 38 | 4.7% |
| 22 | 37 | 4.6% |
| 23 | 35 | 4.3% |
| 20 | 34 | 4.2% |
| 21 | 32 | 3.9% |
| 18 | 30 | 3.7% |
| 24 | 27 | 3.3% |
| 16 | 26 | 3.2% |
| 197 | 24 | 3.0% |
| Other values (34) | 489 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 15 | 22 | |
| 16 | 26 | |
| 17 | 41 | |
| 18 | 30 | |
| 19 | 38 | |
| 20 | 34 | |
| 21 | 32 | |
| 22 | 37 | |
| 23 | 35 | |
| 24 | 27 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 205 | 18 | |
| 204 | 22 | |
| 203 | 22 | |
| 202 | 14 | |
| 201 | 18 | |
| 200 | 14 | |
| 199 | 14 | |
| 198 | 15 | |
| 197 | 24 | |
| 196 | 21 |
temperature
| Distinct | 788 |
|---|---|
| Distinct (%) | 96.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22.58808318 |
| Minimum | 8.825674745 |
|---|---|
| Maximum | 41.94865736 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.5 KiB |
Quantile statistics
| Minimum | 8.825674745 |
|---|---|
| 5-th percentile | 16.51783455 |
| Q1 | 19.33162606 |
| median | 22.05592283 |
| Q3 | 24.71417533 |
| 95-th percentile | 33.37036145 |
| Maximum | 41.94865736 |
| Range | 33.12298261 |
| Interquartile range (IQR) | 5.38254927 |
Descriptive statistics
| Standard deviation | 5.047262483 |
|---|---|
| Coefficient of variation (CV) | 0.2234480209 |
| Kurtosis | 2.186847293 |
| Mean | 22.58808318 |
| Median Absolute Deviation (MAD) | 2.71827207 |
| Skewness | 0.9893502923 |
| Sum | 18364.11163 |
| Variance | 25.47485857 |
| Monotonicity | Not monotonic |
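The skewness and kurtosis rows above describe the shape of the distribution. Because kurtosis values in this report can be negative, they are excess kurtosis (kurtosis minus 3, so a normal distribution scores 0). The sketch below uses plain moment-based formulas without bias correction, which is an assumption; the report's figures may come from bias-corrected estimators and need not match these formulas exactly. All function names are illustrative.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n (no bias correction)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

# Toy data (not the real column): symmetric and flat, so the skewness is zero
# and the excess kurtosis is negative.
toy = [1, 2, 3, 4, 5]
print(skewness(toy))                   # 0.0
print(round(excess_kurtosis(toy), 4))  # -1.3
```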
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 19.52226241 | 2 | 0.2% |
| 23.46168338 | 2 | 0.2% |
| 20.76952209 | 2 | 0.2% |
| 22.64236876 | 2 | 0.2% |
| 15.53834801 | 2 | 0.2% |
| 17.00067625 | 2 | 0.2% |
| 19.91853092 | 2 | 0.2% |
| 18.15300153 | 2 | 0.2% |
| 20.16080524 | 2 | 0.2% |
| 21.53989176 | 2 | 0.2% |
| Other values (778) | 793 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8.825674745 | 1 | |
| 9.467960445 | 1 | |
| 9.535585543 | 1 | |
| 9.724457611 | 1 | |
| 9.851242629 | 1 | |
| 9.949929082 | 1 | |
| 10.38004759 | 1 | |
| 10.72302459 | 1 | |
| 10.89875873 | 1 | |
| 11.02105378 | 1 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 41.94865736 | 1 | |
| 41.65602996 | 1 | |
| 41.36106301 | 1 | |
| 41.20733624 | 1 | |
| 41.18664903 | 1 | |
| 40.66012294 | 1 | |
| 39.70772192 | 1 | |
| 39.64851881 | 1 | |
| 39.37102553 | 1 | |
| 39.06555518 | 1 |
humidity
| Distinct | 769 |
|---|---|
| Distinct (%) | 94.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 56.96614304 |
| Minimum | 14.25803981 |
|---|---|
| Maximum | 94.92048112 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.5 KiB |
Quantile statistics
| Minimum | 14.25803981 |
|---|---|
| 5-th percentile | 15.80224812 |
| Q1 | 22.33195853 |
| median | 65.34583901 |
| Q3 | 82.45687182 |
| 95-th percentile | 92.72604913 |
| Maximum | 94.92048112 |
| Range | 80.66244131 |
| Interquartile range (IQR) | 60.12491329 |
Descriptive statistics
| Standard deviation | 28.78614121 |
|---|---|
| Coefficient of variation (CV) | 0.5053201722 |
| Kurtosis | -1.563963712 |
| Mean | 56.96614304 |
| Median Absolute Deviation (MAD) | 19.41335527 |
| Skewness | -0.3036296928 |
| Sum | 46313.47429 |
| Variance | 828.641926 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 20.4555596 | 2 | 0.2% |
| 24.25386207 | 2 | 0.2% |
| 19.68751084 | 2 | 0.2% |
| 57.68272924 | 2 | 0.2% |
| 24.54038287 | 2 | 0.2% |
| 24.96881755 | 2 | 0.2% |
| 18.93146941 | 2 | 0.2% |
| 66.50415474 | 2 | 0.2% |
| 23.75560241 | 2 | 0.2% |
| 23.22197648 | 2 | 0.2% |
| Other values (759) | 793 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 14.25803981 | 1 | |
| 14.27327988 | 1 | |
| 14.2804191 | 1 | |
| 14.32313811 | 1 | |
| 14.33847406 | 1 | |
| 14.42457525 | 1 | |
| 14.44008871 | 1 | |
| 14.44228303 | 1 | |
| 14.62313811 | 1 | |
| 14.69765308 | 1 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 94.92048112 | 1 | |
| 94.89613443 | 1 | |
| 94.76285385 | 1 | |
| 94.73763514 | 1 | |
| 94.71203306 | 1 | |
| 94.67695747 | 1 | |
| 94.58900601 | 1 | |
| 94.58075845 | 1 | |
| 94.5764581 | 1 | |
| 94.54128292 | 1 |
ph
| Distinct | 758 |
|---|---|
| Distinct (%) | 93.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.308703046 |
| Minimum | 4.548202098 |
|---|---|
| Maximum | 8.967057762 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.5 KiB |
Quantile statistics
| Minimum | 4.548202098 |
|---|---|
| 5-th percentile | 5.279523912 |
| Q1 | 5.732453638 |
| median | 6.112305667 |
| Q3 | 6.655918078 |
| 95-th percentile | 7.98156501 |
| Maximum | 8.967057762 |
| Range | 4.418855664 |
| Interquartile range (IQR) | 0.92346444 |
Descriptive statistics
| Standard deviation | 0.8365354135 |
|---|---|
| Coefficient of variation (CV) | 0.1326002203 |
| Kurtosis | 0.6629709333 |
| Mean | 6.308703046 |
| Median Absolute Deviation (MAD) | 0.422239979 |
| Skewness | 0.9210569124 |
| Sum | 5128.975576 |
| Variance | 0.6997914981 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 7.152811172 | 2 | 0.2% |
| 7.228963452 | 2 | 0.2% |
| 7.690962338 | 2 | 0.2% |
| 5.509295379 | 2 | 0.2% |
| 6.481783043 | 2 | 0.2% |
| 5.945465949 | 2 | 0.2% |
| 6.515499549 | 2 | 0.2% |
| 6.391173589 | 2 | 0.2% |
| 5.988992796 | 2 | 0.2% |
| 5.502999119 | 2 | 0.2% |
| Other values (748) | 793 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.548202098 | 1 | |
| 4.567446499 | 1 | |
| 4.603563116 | 1 | |
| 4.608695247 | 1 | |
| 4.672437054 | 1 | |
| 4.674941549 | 1 | |
| 4.681576043 | 1 | |
| 4.684079249 | 1 | |
| 4.696518678 | 1 | |
| 4.697750704 | 1 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8.967057762 | 1 | |
| 8.931756558 | 1 | |
| 8.868741443 | 1 | |
| 8.861479668 | 1 | |
| 8.829273328 | 1 | |
| 8.766128654 | 1 | |
| 8.753795334 | 2 | |
| 8.736337905 | 1 | |
| 8.719960893 | 1 | |
| 8.718192847 | 2 |
rainfall
| Distinct | 788 |
|---|---|
| Distinct (%) | 96.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 111.584416 |
| Minimum | 5.31450727 |
|---|---|
| Maximum | 298.5601175 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.5 KiB |
Quantile statistics
| Minimum | 5.31450727 |
|---|---|
| 5-th percentile | 60.501239 |
| Q1 | 73.0926704 |
| median | 96.65888933 |
| Q3 | 125.0972687 |
| 95-th percentile | 245.3557502 |
| Maximum | 298.5601175 |
| Range | 293.2456102 |
| Interquartile range (IQR) | 52.0045983 |
Descriptive statistics
| Standard deviation | 59.42186943 |
|---|---|
| Coefficient of variation (CV) | 0.5325283901 |
| Kurtosis | 1.174180182 |
| Mean | 111.584416 |
| Median Absolute Deviation (MAD) | 25.33935863 |
| Skewness | 1.119103291 |
| Sum | 90718.13024 |
| Variance | 3530.958567 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 95.17028129 | 2 | 0.2% |
| 125.0972687 | 2 | 0.2% |
| 75.45328039 | 2 | 0.2% |
| 122.3886015 | 2 | 0.2% |
| 62.21292186 | 2 | 0.2% |
| 107.6907964 | 2 | 0.2% |
| 103.2926407 | 2 | 0.2% |
| 113.334026 | 2 | 0.2% |
| 95.84253438 | 2 | 0.2% |
| 105.4120514 | 2 | 0.2% |
| Other values (778) | 793 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5.31450727 | 1 | |
| 5.370175667 | 1 | |
| 5.408681786 | 1 | |
| 5.686167788 | 1 | |
| 5.861398642 | 1 | |
| 6.00080568 | 1 | |
| 6.018627178 | 1 | |
| 6.124709117 | 1 | |
| 6.15393208 | 1 | |
| 6.250660556 | 1 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 298.5601175 | 1 | |
| 298.4018471 | 1 | |
| 295.9248796 | 1 | |
| 295.6094492 | 1 | |
| 291.2986618 | 1 | |
| 290.6793783 | 1 | |
| 287.5766935 | 1 | |
| 286.5083725 | 1 | |
| 285.2493645 | 1 | |
| 284.4364567 | 1 |
label
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.5 KiB |
| rice | 139 |
|---|---|
| Soyabeans | 130 |
| beans | 125 |
| maize | 119 |
| peas | 100 |
| Other values (2) | 200 |
Length
| Max length | 9 |
|---|---|
| Median length | 6 |
| Mean length | 5.468634686 |
| Min length | 4 |
Characters and Unicode
| Total characters | 4446 |
|---|---|
| Distinct characters | 16 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
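As an illustration of those character properties, the sketch below uses Python's standard `unicodedata` module to count the general category of every character in the label column, weighting each label by its count from the Common Values table. The `label_counts` dictionary is transcribed from that table; the approach is an assumption about how such figures can be reproduced, not the report's own code.

```python
import unicodedata
from collections import Counter

# Label counts transcribed from the "Common Values" table.
label_counts = {"rice": 139, "Soyabeans": 130, "beans": 125, "maize": 119,
                "peas": 100, "apple": 100, "grapes": 100}

# "Ll" is Lowercase Letter, "Lu" is Uppercase Letter.
category_counts = Counter()
for label, count in label_counts.items():
    for ch in label:
        category_counts[unicodedata.category(ch)] += count

print(category_counts)                # Counter({'Ll': 4316, 'Lu': 130})
print(sum(category_counts.values()))  # 4446
```

The only uppercase letter is the leading "S" of "Soyabeans", which accounts for all 130 uppercase characters.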
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | rice |
|---|---|
| 2nd row | rice |
| 3rd row | rice |
| 4th row | rice |
| 5th row | rice |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| rice | 139 | |
| Soyabeans | 130 | |
| beans | 125 | |
| maize | 119 | |
| peas | 100 | |
| apple | 100 | |
| grapes | 100 |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| rice | 139 | |
| Soyabeans | 130 | |
| beans | 125 | |
| maize | 119 | |
| peas | 100 | |
| apple | 100 | |
| grapes | 100 |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 813 | |
| a | 804 | |
| s | 455 | |
| p | 400 | |
| i | 258 | 5.8% |
| b | 255 | 5.7% |
| n | 255 | 5.7% |
| r | 239 | 5.4% |
| c | 139 | 3.1% |
| S | 130 | 2.9% |
| Other values (6) | 698 |
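The character counts above follow directly from the label counts: each character of a label occurs once per row carrying that label. A minimal sketch that rebuilds the tally with `collections.Counter`; the `label_counts` dictionary is transcribed from the Common Values table and the variable names are illustrative:

```python
from collections import Counter

# Label counts transcribed from the "Common Values" table.
label_counts = {"rice": 139, "Soyabeans": 130, "beans": 125, "maize": 119,
                "peas": 100, "apple": 100, "grapes": 100}

char_counts = Counter()
for label, count in label_counts.items():
    for ch in label:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
print(total_chars, len(char_counts))  # 4446 16

# Four most frequent characters with their share of all characters.
for ch, n in char_counts.most_common(4):
    print(ch, n, f"{100 * n / total_chars:.1f}%")
# e 813 18.3%
# a 804 18.1%
# s 455 10.2%
# p 400 9.0%
```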
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 4316 | |
| Uppercase Letter | 130 | 2.9% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| e | 813 | |
| a | 804 | |
| s | 455 | |
| p | 400 | |
| i | 258 | 6.0% |
| b | 255 | 5.9% |
| n | 255 | 5.9% |
| r | 239 | 5.5% |
| c | 139 | 3.2% |
| o | 130 | 3.0% |
| Other values (5) | 568 |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| S | 130 |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 4446 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| e | 813 | |
| a | 804 | |
| s | 455 | |
| p | 400 | |
| i | 258 | 5.8% |
| b | 255 | 5.7% |
| n | 255 | 5.7% |
| r | 239 | 5.4% |
| c | 139 | 3.1% |
| S | 130 | 2.9% |
| Other values (6) | 698 |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 4446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| e | 813 | |
| a | 804 | |
| s | 455 | |
| p | 400 | |
| i | 258 | 5.8% |
| b | 255 | 5.7% |
| n | 255 | 5.7% |
| r | 239 | 5.4% |
| c | 139 | 3.1% |
| S | 130 | 2.9% |
| Other values (6) | 698 |
Auto
The auto setting is an easily interpretable pairwise column metric that picks a method according to the types of the two columns:
| vartype-vartype | method |
|---|---|
| categorical-categorical | Cramér's V |
| numerical-categorical | Cramér's V (using a discretized numerical column) |
| numerical-numerical | Spearman's ρ |
This configuration uses the most suitable method for each pair of columns.
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.
To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
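The calculation of ρ described above can be sketched in plain Python: rank both variables, then apply the Pearson formula to the ranks. This is an illustrative implementation on toy data, not the code used to produce the report; `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical helper names, and tied values are assumed to share the average of their ranks.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Toy data: y = x**3 is monotonic but not linear.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(round(spearman_rho(xs, ys), 6))  # 1.0
print(round(pearson_r(xs, ys), 4))     # 0.9431
```

On this toy data ρ reaches 1 because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.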
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.
To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
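The formula for r and its invariance under changes of location and scale can be illustrated with a few lines of Python. This is a sketch on toy data, with `pearson_r` as a hypothetical helper name; the scale factors used are positive, since a negative factor would flip the sign of r.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# Toy data (not taken from the dataset).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))  # 0.7746

# Shifting and rescaling either variable leaves r unchanged.
shifted = pearson_r([10 * x + 3 for x in xs], [0.5 * y - 7 for y in ys])
print(math.isclose(r, shifted))  # True
```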
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.
To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
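The pair-counting definition above translates directly into code. The sketch below implements exactly that formula (sometimes called tau-a), counting tied pairs as neither concordant nor discordant. Statistical libraries often apply a tie correction instead, so their results can differ when ties are present. `kendall_tau` is a hypothetical helper name and the data is a toy example.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total pairs; ties count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Toy data: of the 6 pairs, 5 are concordant and 1 is discordant.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
print(round(kendall_tau(xs, ys), 4))  # 0.6667
```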
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
First rows
| | df_index | N | P | K | temperature | humidity | ph | rainfall | label |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 90 | 42 | 43 | 20.879744 | 82.002744 | 6.502985 | 202.935536 | rice |
| 1 | 1 | 85 | 58 | 41 | 21.770462 | 80.319644 | 7.038096 | 226.655537 | rice |
| 2 | 2 | 60 | 55 | 44 | 23.004459 | 82.320763 | 7.840207 | 263.964248 | rice |
| 3 | 3 | 74 | 35 | 40 | 26.491096 | 80.158363 | 6.980401 | 242.864034 | rice |
| 4 | 4 | 78 | 42 | 42 | 20.130175 | 81.604873 | 7.628473 | 262.717340 | rice |
| 5 | 5 | 69 | 37 | 42 | 23.058049 | 83.370118 | 7.073454 | 251.055000 | rice |
| 6 | 6 | 69 | 55 | 38 | 22.708838 | 82.639414 | 5.700806 | 271.324860 | rice |
| 7 | 7 | 94 | 53 | 40 | 20.277744 | 82.894086 | 5.718627 | 241.974195 | rice |
| 8 | 8 | 89 | 54 | 38 | 24.515881 | 83.535216 | 6.685346 | 230.446236 | rice |
| 9 | 9 | 68 | 58 | 38 | 23.223974 | 83.033227 | 6.336254 | 221.209196 | rice |
Last rows
| | df_index | N | P | K | temperature | humidity | ph | rainfall | label |
|---|---|---|---|---|---|---|---|---|---|
| 803 | 1631 | 0 | 65 | 15 | 23.461683 | 23.221976 | 5.645436 | 95.842534 | beans |
| 804 | 1632 | 13 | 72 | 21 | 24.321166 | 21.027867 | 5.821194 | 60.275525 | beans |
| 805 | 1633 | 34 | 60 | 23 | 20.125741 | 24.969699 | 5.659255 | 100.049718 | beans |
| 806 | 1634 | 9 | 80 | 19 | 21.806196 | 18.570866 | 5.945466 | 125.097269 | beans |
| 807 | 1635 | 11 | 72 | 20 | 19.522262 | 24.926072 | 5.951177 | 113.334026 | beans |
| 808 | 1636 | 3 | 67 | 24 | 17.000676 | 19.907905 | 5.520880 | 103.292641 | beans |
| 809 | 1637 | 35 | 69 | 23 | 16.787915 | 24.968818 | 5.578410 | 75.453280 | beans |
| 810 | 1638 | 3 | 77 | 25 | 24.849062 | 22.894646 | 5.608165 | 62.212922 | beans |
| 811 | 1639 | 23 | 62 | 19 | 16.517835 | 20.455560 | 5.609435 | 98.777942 | beans |
| 812 | 1640 | 22 | 71 | 17 | 18.153002 | 19.386021 | 5.509295 | 107.690796 | beans |